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Abstract 

Compressive sensing is a technique to sample signals well below the Nyquist rate using 
linear measurement operators. In this paper we present an algorithm for signal reconstruc- 
tion given such a set of measurements. This algorithm generalises and extends previous 
iterative hard thresholding algorithms and we give sufficient conditions for successful re- 
construction of the original data signal. In addition we show that by underestimating the 
sparsity of the data signal we can increase the success rate of the algorithm. 

We also present a number of modifications to this algorithm: the incorporation of a 
least squares step, polynomial acceleration and an adaptive method for choosing the step- 
length. These modified algorithms converge to the correct solution under similar conditions 
to the original un-modified algorithm. Empirical evidence show that these modifications 
dramatically increase both the success rate and the rate of convergence, and can outperform 
other algorithms previously used for signal reconstruction in compressive sensing. 

1 Introduction 

Compressive sensing is a radical new way of sampling signals at a sub-Nyquist rate. The Shan- 
non/Nyquist sampling theorem tells us that an analogue signal can be reconstructed perfectly 
from its samples, if it was sampled at a rate at least twice the highest frequency present in the 
signal, known as the Nyquist rate [251]29j . For many signals, such as audio or images, the Nyquist 
rate can be very high. This may result in acquiring a very large number of samples, which must 
be compressed in order to store or transmit them, as well as placing a high requirement on the 
equipment needed to sample the signal. Compressive Sensing (or compressed sensing or CS) is a 
recently introduced method that can reduce the number of measurements required, in some ways 
it can be regarded as automatically compressing the signal. Compressive sensing is a technique 
that enables us to fully reconstruct particular classes of signals if the original signal, is sampled 
at a rate well below the Nyquist rate. 

In particular, compressive sensing works with sparse signals. In many applications the signal 
of interest is primarily zero in some known fixed basis, that is, in this representation the data 
contained within the signal is sparse. Traditional measurement techniques which consist of 
sampling at all the possible data points heavily over-sample the signal. Consider the scenario 
where we randomly draw samples from a sparse signal, then the probability of sampling at an 
"interesting" data point is equal to the sparsity fraction. Compressive sensing gets around this 
by using a small number of linear sampling operators that samples the signal across all data 
points simultaneously. This gives rise to the name compressive sensing as it is a combination of 
sampling and compression. 
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More formally, let x € R" be an s-sparse vector in the basis then we can write x = Vfs 
for some vector s S M" such that only s ^ n components of s are non-zero. Assume then that 
our m linear sampling operators are given by the rows of the matrix $ e R™^". The problem 
of compressive sensing is then to find x, or equivalently, s given the measurements y = <i>x and 
the matrices $ and 5*, under the assumption that s is sparse. If to < n then the problem is 
under-determined and there is no unique solution, so we commonly say that we are interested in 
the most sparse solution to this equation, i.e. the minimiser to 

min pIL subject to y = ^'I's. 

To solve this problem in general is known to be NP-hard p!7ll26j. 

One of the original breakthroughs in compressive sensing was to show that linear program- 
ming methods can be used to efficiently reconstruct the data signal with high accuracy pnil2|jl9j . 
Since then many alternative methods have been proposed as a faster or more successful alterna- 
tive to these linear programming algorithms. One approach is to use matching pursuit techniques, 
originally proposed in [5S], variations have been proposed such as OMP or orthogonal match- 
ing pursuit |30j . Stagewise orthogonal matching pursuit (StOMP) ^18) . Compressive sampling 
matching pursuit (CoSaMP) [27| and gradient pursuit algorithms [S1I31[71[H] • Also proposed has 
been a suite of thresholding based algorithms, either hard thresholding (IHT) [SP^ or soft thresh- 
olding [151116] . What we propose is a combination of some of these techniques, combining hard 
thresholding with matching pursuit methods. We will also show how we can use polynomial 
acceleration techniques to increase the rate of convergence. 

In addition to this, work has also been done on model based compressive sensing in 2J, which 
can be applied to many of the algorithms above. Most of the aforementioned algorithms, in 
particular CoSaMP and IHT make use of a pruning step which takes a solution and forces it 
to be sparse, by removing all but the s-largest (in magnitude) components, which is the best 
s sparse approximation under any £p norm for 1 ^ p < oo. Model based compressive sensing 
proposes using a model based algorithm to perform this, that is to choose the sparse signal 
that is not necessarily closest under an ip distance, but that best fits the signal model. Such a 
modification would also be applicable to our algorithm. 

We will refer to our algorithm as the "Modified Frame Reconstruction" or MFR algorithm. 

1.1 Paper Overview 

In section [2] of this paper we will give an overview of the fundamental concepts in compressive 
sensing. We will also briefly discuss frames, the frame reconstruction algorithm and polynomial 
acceleration using Chebyshev polynomials. In section [3] we will present our own algorithm and 
theoretical results regarding its performance. This work is similar to that in OH], but we analyse 
the algorithm in a different manner and in a more general setting. Later in section U] we present 
a number of extensions to this algorithm that increase both the rate of convergence and the 
probability of successfully estimating the original input signal. We show that by incorporating a 
least squares step, polynomial acceleration and a variable step-length, that we can significantly 
increase the performance of our algorithm. In section [5] we perform simulations of our algorithm 
demonstrating that the theoretical results translate into a real world performance advantage. 
Section El contains a comparison of our algorithm to various other existing methods. 

1.2 Notation 

In this paper bold-faced symbols represent vectors, such as x g R", with components xi, . . . , 
For an index set F C {1, 2, . . . , n} we take xr G R" to be the vector that agrees with x on 
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all the indices i G T and is elsewhere. For a matrix <i> € ]^rnxn ^^^^ $t^^* g^^^ to 
be the transpose, complex-conjugate transpose and Moore-Penrose pseudo-inverse respectively. 
Hence $ty jg the solution to the minimisation problem argmiux ||$x — y||2. For a matrix $ 
we take $r to be the submatrix formed by taking the columns of $ indicated by the set F. 
When we write $J we apply the column selection first, then the transpose, e.g. <i>f = (<&r)^- 
We take Ainin($), Aniax(*i') to be the lower and upper eigenvalues of the matrix <i> and similarly 
o'max(*i'), o'niin($) are the upper and lower singular values of 

Recall that the £p distance, for 1 ^ p < oo, between two vectors x, y G M" is defined to 
be d{x,y) = {J27=i - Uil'^f^^ giving rise to the £p norm ||x||p = {J27=i \xt\^f^^- The io 
pseudo-norm is given by ||x||q ~ \{xi 7^ 0}|, i.e. the number of non-zero components of x. We 
say that a vector x is s-sparse if ||x||q = s, that is, precisely s components of x are non-zero. 
The i-th stage of an iterative algorithm producing the vector x is denoted by x'*^ . 

We will write Mr : K" R" to be the function that returns x, a best r-sparse vector 
approximation to the input x under any ^p-norm, 1 ^ p < 00, i.e. a r-sparse vector such that 
||x — x||^ is minimal. Note that this has the same solution for any I ^ p < 00: the vector x 
consisting of the r-largest (in magnitude) components of x. If this vector is not unique, we can 
decide whether to break the tie randomly or deterministically. In our implementation we chose 
to break the tie lexicographically. We also write supp(x) to denote the support of the vector x, 
that is, supp(x) = {xi : Xi 7^ 0}. 



2 Background 

2.1 Compressive Sensing 

Let X e M" be a signal and let <I> g ^^x" be a measurement matrix. We call the vector y e R™ 
the vector of observations if y = <&x -I- e where e € R™ is a vector of noise or errors, possibly 
equal to 0. The task is to recover x given only the observations y and the matrix <&. Clearly if 
e — m — n and the columns of $ are linearly independent, then $ is invertible and x can be 
recovered exactly. However compressive sensing asks (and answers) the question, how well can 
we do if m nl 

First, let us define the Restricted Isometry Property (RIP) condition of order s for a matrix 
$ e R™x" as per jlO]. Let (5^ ^ be the smallest value such that 

l-6s^^^^l + S. (1) 

l|x||2 

for all s-sparse vectors. We call ds the restricted isometry constant of order s. If 5s < 1 we say 
that <I> satisfies the RIP of order s. It is easy to see that we have 

Amin ($^$) 1 - (5, < 1 + ^ A„,ax (*^$) , 

for any s. For a given matrix $ the restricted isometry constant Ss can be calculated by 

Ss^ max {1 - Amin ($?$r) , Amax (*f$r) - 1} ■ 
T: |ri=s 

This is proved in Lemma [51 

In order to give some sense to the problem of finding a vector x, given the measurements y 
so that y = $x, we make the assumption that x is sparse. We can then cast this question as a 
minimisation problem 

arg min ||x|L subject to y = <I>x. 
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The following lemma from [T^] gives conditions under which the minimiser to the £o minimisation 
problem above is the same as the solution to y = <i>x. 

Lemma 1 (Lemma 1.2 of [E])- Let $ G M™^" be a matrix with RIP constant 62s < 1 and let 
r be an index set with \T\ ^ s. Let x G M" be a vector with support T and set y = <i>x. Then x 
is the unique solution to 

arg min |jx||Q subject to y = $x, 

and hence x can be reconstructed exactly from y. 

Lemma [T] only proves the existence of a solution but does not say anything about how to find 

it. 

One question is, what matrices <& satisfy the RIP of order s? In general, it is not possible to 
design a matrix $ that satisfies the RIP of a particular order. It has been shown that certain 
classes of random matrices will obey the RIP with very high probability. For example, let the 
columns of $ be sampled uniformly at random from the unit sphere, or let each entry <I>y be 
sampled independently from the Gaussian distribution with mean and variance Then $ 
obeys the RIP of order s with probability at least 1 — 0{e~"^) provided that m ^ Cslog(n/s) 
for some constant C. As a rule of thumb, for a Gaussian measurement matrix it suffices to take 
TO « 2s log n so that £1 minimisation will work. [T51[51[^ 

2.2 Frames 

Frames are a generalisation of bases for Hilbert spaces and have been heavily used in wavelet 
decompositions |14l24j . Let be an m-dimensional Hilbert space and let $ — {(pi £ Jf: 1 ^ i ^ 
n} be a set of n elements. Then $ forms a frame for if there exist constants Q < A ^ B < 00 
such that 

n 

1=1 

for all X G and where \\-\\ ^ and (•, •)^ are the norm and scalar-product defined on the Hilbert 
space. The function S : Jff — > given by 

Sx^Y^ {x,ipj)^ipj, 

is called the frame operator. Then given the vector y of observations where yi = (x, <^i) ^ we 
can reconstruct the original element x G M' via the iterative algorithm 

2;(^+^)=2;« + ^5(x-xW), (3) 

where A and B are the frame bounds in ([2]) |24j . Note that although x is unknown, we know 5*2; 
from the measurement vector y and hence this algorithm can be implemented since S is linear. 
This algorithm will converge from any starting element x G M' . In particular if a;*^'^) = 0, then 
the error satisfies 

^ f ^?_4 V l|x|| ^ . 
\B^ A) " "-^ 

Faster algorithms also exist, which can be found, for example, in [23j. 

It was shown in [23 that using any positive value 7 ^ 2/ (_B + A) in place of 2/ (i? + A) will 
also result in correct reconstruction, but at a possibly slower rate of convergence. 

This algorithm is identical to some of those used in iteratively solving least squares problems, 
such as Richardson's first order method [TJ pp276-280]. 
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2.3 Polynomial Acceleration 

Iterative algorithms can often be sped up by considering the output of all the previous iterations, 
not just the very last iteration. Semi-iterative methods such as polynomial acceleration |2HI22j 
and Richardson's second order method [2 pp280-282] are two ways of doing this. The idea behind 
these two methods is to use the solution from previous iterations for example, if we use the update 

xW=X:cMX«, E^M = 1- 

1 = i=0 

Define the polynomial Pk{t) by 

k 
i=0 

and it follows that the error equation is 



xW-x = Pfe(S) (x(°)-x 



where Pk{B) is a polynomial in the matrix _B, hence the name polynomial acceleration. The 
convergence rate is bounded above by 

max \Pk{t)\, 

and so we wish to find the minimiser to 

min max |Pfc(<)| . 

Pfc:dcgPfc=fcte[0,l] 

The minimising class of polynomials for this term is the Chebyshev polynomials of the first 
kind |31j , defined by the recurrence relationship 

Tn+i{x) = 2xTn{x) - T„_i(x), 

with Tq(x) = 1 and Ti{x) = x. It is shown in [21] that we do not need to use all the previous 
iterates, as the polynomials are generated recursively it suffices to use 



and 
where 



(fc+1) ^ ^{k-D + (^^A^ (y - Ax«l + x^'^) - x^'^-i) 



x^ ' ' = x^ ' + LO^ ' ' I 7/1 I y 



l-cj(fe)i^ h + a 

and where a, h are the minimum and maximum eigenvalues of A. This particular method 
is known as the Chebyshev semi-iterative method. The second order Richardson method differs 
only slightly in that we take 

^ = ^W = g . 

In fact it can be shown that in the Chebyshev method we have a;^'^) lo as k 00. These 
algorithms typically converge an order of magnitude faster than without the acceleration [ij. 
In [231 it is also shown that this technique leads to significantly faster convergence of the frame 
reconstruction algorithm. 
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3 Reconstruction Algorithm 



3.1 MFR — Modified Frame Reconstruction Algorithm 

Our approach is based on the frame reconstruction algorithm in ([3]). The key observation for 
MFR is the fact that we can still perform this iterative step even if the matrix "if no longer 
forms a frame for the space M". And that if the algorithm converges, it can still converge to a 
solution X of the now under-determined matrix equation y — $x, which is exactly the problem 
of compressive sensing. 

We give the plain version of the MFR algorithm (Algorithm [T]) and then show several modi- 
fications which increase both the rate of convergence and the success rate in finding the original 
sparse vector. We will also show theoretical bounds for the convergence and give sufficient 
conditions for convergence to occur. 

The algorithm consists of two parts, an update and a thresholding step. 

1. Update: Similar to the frame algorithm we perform an update 

aCc+i) =x('=)-h7$T(y-$x«) , (4) 

where y is the vector of measurements, $ is the measurement matrix and 7 is a control 
parameter which we often refer to as the step-length. 

2. Thresholding: The second part of the algorithm is the thresholding procedure where 
we generate the next "solution" 



(5) 

Here we simply threshold the output of the frame step producing an s-sparse approximation. 

Recall that ]HIs(z) produces the best s-sparse approximation to the input z under any £p norm 
for 1 < p < cx). These two steps are repeated until "convergence" occurs, that is, the change 
from one iteration to the next is sufficiently small. 



3.2 Analysis of Performance 

We now state the properties of the MFR algorithm in Theorem[5]and Proposition[31 Proposition 
[3] gives sufficient convergence conditions in the scenario where we measure a sparse signal with 
noise, i.e. in the model y = $x -I- e where $ e R™^" is the measurement matrix, x G R" is the 
s-sparse data, e S R™ is noise and y S E™ is the observed signal. We will see in Theorem [5] that 
the MFR algorithm converges even when the signal is not sparse. The proof of these results uses 
the same techniques as the proof in [H] although we choose to use the RIP condition rather than 
the modified RIP condition. 

Theorem 2. Fix s — s where s is the sparsity of the desired solution. Then given measurements 
y = $x + e where $ € R^x" fias the RIP such that either condition (a), (b) or (c) is satisfied 

^ ^ 1 \ — TT' '^^^ ^^S'' ^ °^ 

Oss - 02s + 1 v32 



W T < T \ -T' - ^2s) >l- 



5zs-52s + r ' \/32 
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Algorithm 1 Modified Frame Reconstruction Algoritlim 
Input: 

• Tlic measurement matrix $. 

• Observation vector y. 

• Estimate of sparsity s of the vector x. 

• Step size 7. 

• Tolerance parameter etoi- 
Output: 

• A vector x e M" that is s-sparse. 

1: x(") ^ 

2: fc ^ 1 

3: virhile ||x('^) — x^'^'^^ll^ > Stoi do 

4. x(fc+i) ^ jj. ^^{k) ^^T _ ^x^*^))] 

5: k ^ k + 1 

6: end while 
7: return x*^*^^ 



i/ien Algorithm\J\will recover an approximation x^*"') satisfying 



xC^) - x^ 



^2-'=||x^-||2+47v/l + <52s||eil2 + 



where x^'' is i/ie &esi 2s-sparse approximation to x. 



2s I 



2s I 



/2s 



(7) 



Proposition 3. Under the conditions of Theorem\^and given measurements y = <i>x + e where 
X is s-sparse for s ^ s and $ G jj'^x" fiag fijp such that either condition (a), (h) or (c) is 
satisfied 



(a) -f^ 

(b) l< 
(c) 



^ — -, and j5s+2S < or 

0s+2s - Os+s + 1 V32 

and 7(1 - 6s+s) ^ 1 - 
and Ss+s < 1, 



Ss+2S — 3s+s + 1 

3 1 



or 



< 7 < 



4(1 - <5s+s) ' 1 - <5s+ 
then Algorithm[J\will recover an approximation x^*"'^ satisfying 



X - x(^) 



2-^||x||2+47v/l + <5s+s||e|l2 



(8) 



Proof. Conditions (a) and (b) come from Lemma 2] and condition (c) follows by setting a = ^ 
in Lemma [S] □ 
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We will first prove Lemmas |3] and [SI hence proving Proposition [3] and then use this to prove 
the main theorem. 



Lemma 4. Under the conditions of Theorem\^ and given measurements y = $x + e where x is 
s-sparse for s ^ s and $ G R'"^" has the RIP such that either condition (a) or condition (b) is 
satisfied 

(a) 7 ^ J T —r, and jSs+2s ^ or 

(>s+2s - Os+s + 1 V32 



(b) 7<T T —r,andj{l-Ss+s)^l-^=, 

(>s+2s - Os+S + 1 V32 

then Algorithm[J\wiU recover an approximation x^''^ satisfying 



s^2-'=||x|l2+47v/r+^||e||2. (10) 

To prove Lemma 21 we need the following lemmas. 

Lemma 5. Suppose that $ G M™^" obeys the restricted isometry property of order s with value 
Sg. Then for any set of indices T such that \T\ ^ s the singular values of $r He in the range 
— 5s, \/l + Sg]- Furthermore the eigenvalues o/<f>J$r — I Hg in the interval [—5s,6s]. 

Proof. Recall that the RIP says that for all vectors x' G R* for t ^ s 

Ilx'lla 

Furthermore we know that the Rayleigh quotient for a matrix A is bounded by the minimum 
and maximum eigenvalues of A, i.e. Amin(^) ^ Pa{^) ^ Amax(^) for all x. Set A — ^f^r, 
then the eigenvalues of A are the square of the singular values of $r- As the eigenvalues for A 
are bounded by 1 ± (5^ we have the corresponding bound for the singular values of <I>r, namely 
(T($r) G — Ss, Vl + Ss]- It follows then that the eigenvalues of $J$r — I lie in the interval 

[-Ss,Ss]. □ 

Lemma 6. For all index sets F and all measurement matrices <i> for which the RIP holds with 

s^\r\ 

||(l-$f$r)xr||2 «;(5,||xr||2. (11) 

and furthermore 

||(l-7$f$r)xr||2 [l-7(l-<5s)]||xr||2. (12) 

Proof. The RIP guarantees us that the eigenvalues of <i>f <i>r lie in the range l — Sg to l+Sg. Hence 
the matrix I — $J<I>r has eigenvalues in [— i5s,5s]- For the second result we proceed similarly. 
Clearly 7$J$r has eigenvalues in the range [7(1 — Ss), 7(1 + (5s)], hence the maximum eigenvalue 
of I — 7<i>J<i>r is less than or equal to 1 — 7(1 — 6s). 
Then for each fixed F we have 

||(l-7$f$r)xr||2 ^ ||(l-7$f$r)||2 • llxrila 
^(l-7(l-'5,))||xr||2, 

and the first equation follows by setting 7 = 1. □ 
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Lemma 7 (Based on Lemma 2 of [5]). Let F and A be two disjoint index sets for the matrix $. 
Then for all $ for which the RIP holds with s = jP U A| 



|$J$axa|| Ss ||xa|12 ■ 



(14) 



Proof. Set r2 = r U A. Since T and A are disjoint, the matrix (— $J$a) is a submatrix of the 
matrix I — 'i's'j'&jj • Since the largest singular value of a submatrix is bounded above by the largest 
singular value of the full matrix, we have 



l^r^AlL III - 



Ss 



as the eigenvalues of I — 'i'^^'n are bounded above by Ss- Hence 

||$J$axa||2 < II^f^aIIj • IIxaIIz 

< llxAlla , 

completing the lemma. 

We now have the necessary results to prove Lemma |H 
Proof of Lemma^ Note that this proof follows parts of the proof in [6j. First put 

r« ^x-x«, 

x(*) ^H,(a«), 
F* = supp(x), 
F(') ^supp(x(*)). 

As a consequence we have |F*| ^ s and [F^'^I ^ s. 
Consider the error ||x* — x^'^^^ [j^. Now we have 



□ 



X = X = xr* = xgd) 



and 



.(0 



,{'0 



Applying the triangle inequality we get 



x^ - x('+i) 






2 



(i+1) _ Ji+l) 



(18) 



As x'^*+^) is the thresholded version of a^'^^' it is the best s-term approximation to a'^*+^\ in 
particular it is better than x^. Hence 



and thus (fTSll becomes 



s$ 2 



^B(i+l) 
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Using the fact that y = <I>x* + e and r^*^ = x" — x^*^ we get 



.+1) + 7^B(.+i) (y - $x^i+ 



'■B(i+i) 



hence 



^ 2 



<2 
= 2 

< 2 



(l-7$T(,+i,$B('+i))r^Ui) - •■• 
(l-7*B(.+i)*s('+i))r^Ui) 



■27 



B(i+i)<&S(»)\B('+i)J'B(i)\B(i+ 



■27 11$]; 



B(i+i)«||2 I 



(20) 



by repeated appHcation of the triangle inequahty and by sphtting the residual into t"wo parts, 
_ , Then 



^ s + 2s 



as each set F*^*^) has at most s entries and |r*| ^ s. Recall from the RIP and Lemmas [6] and [7] 
that 



||$Ix||2s=:(l + <5,)||x||^, 
(l-7<i>I$A)xA||2 s;: (i-7(1-^.))I1xa| 



2 ' 



(21a) 
(21b) 
(21c) 



for all matrices <i> -which obey the RIP and sets A, r2,ri', where 51,^1' are disjoint, |A| = s and 
|r2 U rj'l = s. We also have 6s ^ Ss' for all positive integers s ^ s'. Applying (j21bp to the first 
term in (PO)) . and applying (|21cp and (|21ap to the second and third terms respectively, we get 



^2(l-7(l-<5,+,)) 



■_B(> + i) 



27<5, 



s+2s 



'S(')\B('+1) 

+ 27v/l + 5s+j||e||2. 



(22) 



The vectors r^|i+i) and are orthogonal as they have disjoint supports. Now let 

u, V G R" be two orthogonal vectors, then 

I|U|12 + I|V||2< V2||U + V||2. 



We use this to bound the sum of the two terms 



(i) 



and 



Si) 
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We first ask, how do the terms 1 — 7 (1 — Sg+s) and 7(5s+2s compare, given that Sg+s ^ Ss+2s'^ 



We then have either 

1 - 7 + 7'5s+s > lSs+2s <=^ 7 < 



Ss+2s — Ss+s + 1 ' 
1 



Ss+2s — 5s+s + 1 

• Case 1 — Equation (|23ap : Equation (|22p becomes 



^2V27(5,+2s r« +27Vl + ^3+j||e||2. 



Then if 



we have 



Hence 



provided that 



1 

< - 

2 2 



.(0 



+ 27VT+X^||e|l2. 
s;2-^-||x||2 + 47VTT^|le||2, 



7> 



lSs+2s ^ 



Ss+s + 1 ' 



and 



0.177. 



32 



Case 2 — Equation (|23bp : Equation becomes 



< 2^2(1 - 7(1 - r« +27yrT^|le|l 



If 



2^/2(1 - 7(1 - <5,+,)) ^ 1 ^ 7(1 - Ss+s) ^ 1 - ^ 



(Js+J ^1 



1 1 87 - 8 + V2 



we again have 



Hence 



provided that 



-('■+1) 



1 

< - 
2 2 



7 7V32 
+ 27VT+XTI||e||; 



87 



7 < 



s^2-^-||x||2 + 47v/l + <5.+d|e||2, 



->s+2s — i-'s+s 



and 



7(1 - Ss+s) ^ 1 



32 



0.82, 



(23a) 
(23b) 
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Putting these two results together we have 

1 



-(0 



+ 27v/l + <5.+dle|| 



2 ' 



if either of the following conditions (a) or (b) are met 

1 



(a) 
(b) 



7^ 
7 < 



Ss+2s — ^s+s + 1 
1 



and j6s+2s ^ 



Ss+2s ~ ^s+s + 1 

completing the proof of the lemma. 



and 7(1 - 5s+s) ^ 1 



/32 



(27a) 
(27b) 
□ 



Looking at our algorithm in another way shows that the MFR algorithm is capable of attaining 
the bounds of Lemma[TJ Lemma[l]says that given y = <i>x, then the minimiser to ||x||q subject 
to y = <f>x is unique and equal to x if (^23 < 1- Lemma [5] and its corollary shows that the MFR 
algorithm can achieve this bound. 

Lemma 8. Let x e jj'nx" ftg an s-sparse vector and let the matrix 3> G M™^" have RIP constants 
Ss and assume we are given the measurements y = <l>x + e. // 

1 



1 - - 
^ 2 



< 7 < 



1-6. 



1 - Ss+s 

for some constant < a < 1 and s ^ s, and 

Ss+s < 1, 

then the MFR algorithm produces an approximation x*^*^') satisfying 



X - xC^) 



^ a'' llx 



27 ym 



2 . . _ 1 — a 

Proof. Recall the fundamental step of the MFR algorithm 



1^^ (j 



(28) 
(29) 
(30) 
(31) 



As before, set B'-''^ = supp(x) U supp(x('')) and a(''+^) = xC") +7$^ (y - $x('''). Then we have 



(fc+i) _ (fc) 



7*R('.+i) (y- *x 



,(fe) 



x*„^(Ui, +7$]j(.+i)$(x - xf'^)) +7$]j(,+i,e. 



This gives the error estimate 



„(fc+l) 



2 

= 2 

<2 

= 2 

^2(1-7(1-5,+,)) 



ffc+i) 



X - XbI + ^) - 7*B(fc+i)^ (x - X^gl+i^ ) - 7*B(fc + i 



X - x^*^,i+i) - 7$]^(fe+i)$ (x - x'*^(i+i 



2 ||7$"^e| 



(I - 7$"^$) ( X - X 



27||*B(fe+i)e|| 



x-xW 



+ 2jy^l + 6s+s\\e\\2, 
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by Lemmas [H] and [7] This imphes that 









2 



2{l-^{l-Ss+s))] ||X||2 + 



2 ' 



Since X 



l-2(l-7(l-^s+.0) 
= 0. Thus if < 2 (1 — 7 (1 — Ss-\-s)) ^ a < 1 then the algorithm wiU converge. Since 
< 2(1 - 7(1 - Ss+s)) ^ a ^ 1 - - ^ 7(1 - S,+s) < 1 - Ss+s 



1 



< 7 < 



1 



1 — Ss+s 1 — Ss+s ' 

the algorithm will converge provided Ss+s < 1 producing an approximation that obeys 



1 — a 



completing the lemma. 



□ 



This lemma says that for the right value of 7 and provided that Sg+s < 1, the algorithm will 
always converge to the correct solution, at the cost of noise amplification. 

Corollary 9. Under the hypothesis of Lemma\^ and if we could measure the signal y exactly, 
i.e. if e = 0, then for some value of ^, setting s = s gives an algorithm capable of attaining the 
bound in Lemma\^ 

Proof. Setting s = s in (|29p gives the condition 823 < 1 and by choosing 7 so that 

1 



1-2. 
2 



1 



7 < 



1 



Js+s 



then the algorithm will produce a sequence of approximations which converge to x. 



□ 



What is interesting about these results, is that they rely only on the lower RIP constant, 
that is, it only requires 



< 1 



l$x|| 



for all 2s-sparse vectors x. This means that the MFR algorithm will recover the correct s-sparse 
solution X to y = <i>x provided that there are no 2s-sparse vectors in the kernel of $. In fact 
this is a tight theoretical bound: assume that there exists a vector v e M" that is 2s-sparse and 
<I>v = 0. Choose a set r C {1, 2, . . . , n} of size s so that F C supp(v) and set x = — vr so that x 
is s-sparse. Then 

y = <f>x = (f)x + cE>v = <I)(x + v) = $u, 

where u = x + v is s-sparse and u 7^ x. Hence there is no unique minimiser to ||x||g subject to 
y = <i>x and no algorithm will be able to return the correct solution 100% of the time. Thus the 
MFR algorithm is able to achieve the theoretically maximum performance. 

Perhaps even more surprising is that the algorithm will converge as fast as we want (but 
still linearly), i.e. ||x — x'^'^^ [[^ ^ a*^ ||x||2 for any < a < 1, provided we can choose 7 so that 

1 - f < 7(1 - S2s) < 1. 

This seems to be an astounding result, until we realise that this requires accurate values of 
S2S and explicitly calculating S2S for a random matrix is computationally equivalent to directly 
solving the £0 minimisation problem argmin ||x||q subject to y = $x. 

We introduce one final lemma before proving the main theorem. 
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Lemma 10 (Reduction to sparse case, Lemma 6.1 of [IT])- -^^^ x be a vector from R" and 



assume that <f> obeys the RIP of order t, then the sample vector y 
as y ^ $x* + e where 



\\e\\, ^ VlT^ Mix 



til 1 II 
x' L + — X ■ 

"2 Vi" 



<I>x + e can also be written 



(35) 



for any < G Z. 

We now use Proposition [3] and Lemma [10] from [27] to prove the main theorem. This uses 
the same techniques as in [6]. 

Proof of Theorem [3 Let x* be the best s-sparse approximation to x. Then observe that 



rik) 



,(fe) 



We then apply the algorithm to y to recover an s-sparse approximation. From Proposition [3] we 
get the bound 

s$2-''^||x^||2+47v/l + <52s||e|| 



where e = y — ^x" . Using Lemma [TU] and setting t = 2s we can write y — ^x'^* 
is a best 2s-sparse approximation to x, such that 



e where x^* 



3II2 < v/TT^(^||x-x2^| 



1 



,2s I 



/2s 



Hence combining Lemma [TU] and Proposition [3] we get 



X 



(k) 



2-^-||x'^||2+47Vr+^||e||2 + 



+ 47(1+ 52s) 



under conditions (a), (b) or (c). This completes the proof of the theorem. 



□ 



Note that in condition (a), namely (|27ap . setting 7 = 1/(1 + 6s) and s = s gives the same 
conditions on convergence for the proof of the IHT algorithm in [B]. Our result however gener- 
alises and shows that decreasing the step length can compensate for larger i5 values. We choose 
to use the RIP rather than the modified- RIP, unlike the authors of [S], as it leads to easier im- 
plementation. To implement IHT one requires a measurement matrix scaled by 1/(1 + 6s), but 
it is unfeasible to perform this operation exactly. Hence by avoiding this scaling and choosing a 
deliberately smaller 7 (which admittedly disguises some of the difficulty in this scenario) we can 
much more easily implement our algorithm. 

Observe also that this theorem implies that as 7 0, the error due to the noise component 
in the model also goes to 0. 

Strictly speaking, there is no real reason to require ||r^'^+-'^^ II2 ^ Hr*^*^) + 2^^/l 



62s ||e| 



2' 



any value ^ a < 1 with ||r(*''+-'-) Ij^ ^ a ||r^*''''||2 



|e||, would suffice, but perhaps 



offer significantly slower convergence. What happens to the convergence conditions if we allow 
a larger a? 
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Assume we have ||r'^'^+-'-) 11 ^ a ||r'^'^)|| + 27vT+"^2l ||e||,, then it follows that 



^ a 



+ 27^1 + ^2. ||e|| 



,(0) 



+ 2jVl + 62s\\e\\,Y.a' 



< a 



.(0) 



1-a 



which unfortunately threatens significant noise amplification, especially as a gets close to 1. 
Then the convergence criteria become 



(a) 7^ 

(b) 7< 



1 a 1 
— , and jSs+2s ^ -1= < -1= ~ 0.35, or 

Os+2s - Os+s + 1 V8 V8 

1 a 1 
— , and 7(1 - .5,+,-) > 1 - ^ > 1 - ^ « 0.65, 



which are slightly looser than before. 



3.3 Convergence 

If Algorithm [T] converges, it either converges to the correct sparse solution, or it converges to 
another sparse vector, but one that is not an (approximate - to some level of tolerance) solution 
to the equation y = <f>x + e. If the algorithm converges to an incorrect vector x, it is simple to 
test this since ||$x — yjjj ^ \\^\\2^ and, if necessary, rerun the algorithm with different 7 or s 
estimates. 

Suppose that the algorithm converges to x and that we know y exactly, i.e. we are investi- 
gating an equation of the form y = $x. Note that the solution returned by the algorithm, x, is 
s-sparse. Let *P G K"^" be a diagonal matrix that is the projection matrix onto the components 
specified by the support of x, i.e. onto the non-zero components of x. Observe that we have 

= *P = and *Px = X. Then the solution x obeys 

X = *P [x + 7$"^$ (x - x)] 
^ <P$"^$x = «P<I>"^$x 

= <p"^$"^$«Px. 

Recall that the Moore-Penrose pseudo-inverse of a matrix A is given by, if it exists, A^ = 
{A*A)~'^ A*. Let A = $<p, then x obeys 

A"^$x = A'^Ai. 

If we can invert A^ A, then x is given by 

i = {A'^Ay^A^'^x = ($«P)^ $x. 

This result is not at all surprising, it merely confirms that if we knew the support of x, then we 
could find the solution by solving the least squares problem on this support. 



4 Modifications 

We propose several modifications to the MFR algorithm that increase both the success rate and 
the rate of convergence. 
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Algorithm 2 MFR with Polynomial Convergence 



Input: 

• The measurement matrix $. 

• Observation vector y. 

• Estimate of sparsity s of the vector x. 

• Step size 7. 

• Tolerance parameter etoi- 
Output: 

• A vector x e M" that is s-sparse. 

1: ^ 1 

r, „ ^ '^g,ax(f)-<,n(1') 



8: 
9: 
10 
11 



x(o) ^ 

k ^ 1 

vk^hile ||x('^) — x^'^"^) ^ £to/ do 
^(fc+i) ^ 1 



end while 
return x'*^' 



4.1 Accelerated Polynomial Convergence 

We can easily apply the methods from Section 12.31 to increase the convergence rate of our algo- 
rithm to get Algorithm [21 

4.2 Least Squares 

Another method to speed up convergence of the MFR algorithm, is to add a least squares 
minimisation step. The algorithm produces a sparse approximation to the solution, which we 
use as a method of selecting the columns of the solution that contain the non-zero data points. 
On this set of columns, say F, we then solve the least squares problem 

arg min ||y — $z||2, (39) 

z: supp(z)— r 

which has a convenient closed form solution. As this algorithm is deterministic, solving this 
problem twice when restricted to the same support, gives the same solution. Hence we will run 
the update step until the support of the largest s components changes, and then solve the least 
squares problem restricted to this support. The full algorithm is given as Algorithm [31 

Furthermore, we can combine the polynomial variant of this algorithm with least squares. 
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Algorithm 3 MFR with Least Squares 



Input: 

• The measurement matrix $. 

• Observation vector y. 

• Estimate of sparsity s of the vector x. 

• Step size 7. 
Output: 

• A vector x e M" that is s-sparse. 



1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
10 
11 
12 
13 
14 
15 
16 
17 



x(o) ^ 

r(o) ^ 

while ||x(''' -xC'^i^ll^ ^ etoi do 
X ^ xf"') 

X ^ X + 7$''' (y - $x) 

X <— lElIs(x) > Prune 

r <— supp(x) 

if r ^ r(o) then 

xC^+i) ^ $t y > Solve LS on T 

r(o) ^ r 

else 

end if 

fc<-fc + l 
end while 
return x'*^') 



4.3 Adaptive Step-length 



So far we have only considered a step-length, 7, of fixed size, but inherently there is no reason 
why we cannot vary 7 from one iteration to the next. Algorithms with a variable step- length are 
certainly well known, such as in soft-thresholding [IG] and gradient pursuit methods [5.. 

What we propose is a greedy strategy, so that at each iteration the step length 7^'^) is chosen 
so as to minimise a certain quantity. One obvious choice is to minimise the £2 norm of the 
residual, so that we would choose the step- length 7'*^^ to minimise ||x — x'*^' 11 , i.e. 



7 



(fe) 



arg mm 

7^0 



x-H,(x('^-i)+7^'^'i>^ (y-$xW)) 



Unfortunately, since we do not know x, we cannot calculate this quantity. An alternative then 
is to minimise the residual in <i>-space, i.e. ||y — ^x^'^^H^, so at every iteration we choose 7*^*^^ so 
as to minimise 



y - $H,. ( xC^-i) + 7^^)$'^ ( y - ^x'*^' 



(40) 



This gives us Algorithm [H 
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Algorithm 4 MFR Algorithm with locahy optimal 7 
Input: 

• The measurement matrix $. 

• Observation vector y. 

• Estimate of sparsity s of the vector x. 

• Tolerance parameter etoi. 

Output: 

• A vector x e M" that is s-sparse. 

1: x(") ^ 

2: fc ^ 1 

3: while ||x('') -x^'^^^^ll^ ^ etoi do 

4: 7^'') <— argmin^ ||y - (x^'') + 7$''' (y - $x(''))) 
5. xf'^+i) ^ H^x^*^) +7$T (y _ ^x^*^))) 
6: k ^ k + 1 

7: end while 

8: return x'*^) 



This algorithm has two properties, firstly the £2-iiorm of the residual is a non-decreasing 
function, as shown by the following lemma. 

Lemma 11. Let $ e ^"ixn ^ measurement matrix that obeys the RIP of order s and let 
X S K" be the s-sparse signal we are trying to reconstruct given the measurements y = $x. 
Using the MFR algorithm with adaptive step-length, Algorithm^ where the step-length, is chosen 
at every iteration so as to minimise the residual in ^-space, then the £2-norm of the residual is 
a non-increasing function of the number of iterations. 

Proof. Clearly if 7^^) = then we have x^^^ = x^^^^) hence H^r^'^^H^ = H^r^^"^) [j^. Thus 
setting 7('^'' = does not increase the norm of the residual, hence minimising this for 7 > can 
only further decrease 

Although Lemma [Tl] does not guarantee convergence to a vector x, it promises convergence 
to a vector $x that lies on a sphere around the measurements y. 

Secondly, provided that 7'^'^^ meets the conditions of Theorem [5] then convergence is at least 
as fast, as the theorem does not depend on the particular value of 7 used at every iteration. 

5 Simulation Results 

In this section we simulate a number of the algorithms we have discussed. All simulations use 
measurement matrices with Gaussian entries and the data signals were generated by randomly 
choosing a support, and then generating Gaussian entries with variance 1 for the non-zero compo- 
nents. All algorithms were terminated when the change in output from one iteration to the next 
was less than 10~^ using the £2 norm. We classed a simulation run as successful if it produced 
an output that had the correct support. 
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Table 1: Percentage of simulations that resulted in the correct vector being returned by the MFR 
algorithm with least squares for a matrix <I> g K^o^^oo^ Numbers in bold give the maximum for 
that column. We see that we obtain a higher success rate if we underestimate the sparsity of the 
original signal. 



Estimated 


True Sparsity 




aPARfelTY S 


4 


8 


12 


ID 


4 


26% 








8 


94% 


11% 






12 


93% 


51% 


5% 




16 


96% 


55% 


7% 


0% 


20 


95% 


53% 


10% 


2% 


30 


84% 


26% 


0% 


0% 


40 


41% 


6% 


0% 


0% 


Total 


100% 


79% 


17% 


2% 



5.1 Success Rate 

In Figures [l]l2] we plot the success percentages when simulating some of the algorithms mentioned 
in this paper. In both cases we see that the MFR algorithm and its variants outperforms both 
ii minimisation and the CoSaMP algorithm. The MFR algorithm with least squares (MFR + 
LS) is the best performing algorithm. However what this graph does not capture is the speed 
and number of iterations required for convergence, which can be seen in Figure [31 

In Table [T] we can see the advantage of choosing a sparsity estimate s that is strictly larger 
than the true sparsity of the signal we are trying to reconstruct. In every case, the single best 
sparsity estimate is one that is larger than the real sparsity. This illustrates that it is in fact 
an advantage to overestimate the sparsity. This is also significant as if we underestimate the 
sparsity, the algorithm cannot succeed. 

5.2 Convergence Rate 

In Figure [3] we plot histograms of the number of iterations required before convergence for the 
three variants of our algorithms. We generated one thousand random Gaussian matrices <& and 
one thousand sparse data vectors x, so that each algorithm was fed the same input. We see clearly 
that the two modified versions converge significantly quicker than the plain MFR algorithm, and 
that the algorithm incorporating least squares converges faster still. 

For a plot showing how the step-length affects the rate of convergence, see Figure HI 
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Measurement matrix: n = 400, m = 150 



10" 



10" 



CO 



10" 



40 




CoSaMP 

— MFR 
-e— MFR + LS 
■^A— MFR Adapt. 

— MFR Accel 
-a — MFR Accel + LS 



55 
Sparsity 



Measurement matrix: n = 400, m = 200 




Figure 1: Success rates for simulating a number of algorithms with n = 400 and m = 150 (top) 
and m = 200 (bottom), using £i minimisation, CoSaMP, MFR and its variants. In both cases 
MFR with Least Squares outperforms all other algorithms. 
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Measurement matrix: n = 800, m = 300 




Measurement matrix: n = 800, m = 400 




140 



150 



160 170 
Sparsity 



180 



190 



200 



Figure 2: Success rates for simulating a number of algorithms with n = 800 and m = 300 (top) 
and m = 400 (bottom), using £i minimisation, CoSaMP, MFR and its variants. In both cases 
MFR with Least Squares outperforms all other algorithms. 
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Measurement matrix: 200x400 




Figure 3: Histogram of the number of iterations required for convergence for the MFR algorithms. 
From top to bottom: MFR, MFR with polynomial acceleration, MFR with adaptive step-length, 
MFR with least squares and MFR with polynomial acceleration and least squares. The vertical 
dashed line shows the mean number of iterations required before convergence. 
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6 Comparison to Previous Algorithms 



Here we present a detailed comparison of the MFR algorithm and its variants to several other 
reconstruction techniques. 

The algorithm we have presented generalises the previous IHT algorithm. We need to be 
careful in directly comparing the two algorithms as they use slightly different measurement 
matrix structures. The IHT algorithm assumes a scaled matrix $, that is, let <& € ]g™x" have 
RIP constant ds of order s, then the IHT algorithm reconstructs x given the measurements 

whereas we operate directly with the original matrix <&. This means that by setting 7 = , 
dividing the measurements y by 1 + (5s and putting s = s in our algorithm, the two algorithms 
are equivalent. 

The advantage of allowing 7 to to be variable is phenomenal. We have already shown that by 
allowing 7 to be smaller we can dramatically improve the reconstruction rate of the algorithm 
with low sparsity inputs. Alternatively with very sparse inputs we can increase the convergence 
rate by choosing the value of 7 to be larger. Proposition [3] says that, under the hypothesis of 
the theorem, the error in the final signal is bounded by the term 471/1 + S2S \\^\\2 where e is the 
error in measuring the signal. Hence by taking a small value for 7 we can decrease the effect of 
the error in the final output. 

In comparison to the analysis of IHT, Theorem [2] offers a slightly better convergence result. 
Setting s = s and 7 — jqij- yields the identical condition to the main Theorem of [6], our 
analysis says that provided 7 J5 1/ {5s+2s — Sg+s + 1), if Ss+2s > 1/a/32 then we can decrease 7 
and still get convergence. Alternatively, if 6s+2s < l/\/32 then we can increase 7 to get faster 
convergence, provided we could estimate the quantities Ss+s and 6s+2S accurately. The only way 
to do this currently, is to check all ( ~) submatrices (s = s + s 01 s = s + 2s a.s appropriate) of 
which is computationally unfeasible and is in fact computationally equivalent to directly solving 
the original £0 problem, argmin ||x||q subject to y = $x, directly. We see in Figure[l]how the 
rate of convergence increases dramatically with 7, which is what we predicted from Theorem [21 

By requiring a scaled measurement matrix, the implementation of the IHT algorithm becomes 
complicated. There is no known way to generate measurement matrices with particular RIP 
constants, although we can say with high probability they are bounded by a certain value, but to 
implement the IHT algorithm requires the matrix to be scaled by 1 + J^. Hence we must cither 
estimate Sg, which is not discussed in the original work, or calculate it explicitly, which means 
we might as well directly solve the Iq problem. 

We have also proposed adaptively choosing 7*^'^' at every iteration. We have shown that doing 
so can dramatically decrease the error rate in reconstruction. It is ongoing work to see if there is 
a good way to estimate a better 7 in terms of minimising the residual, without directly knowing 
the residual. 

Another important difference to the IHT algorithm is that we discuss what happens if we 
threshold with a value that is strictly larger than the sparsity of the signal we are trying to 
reconstruct, that is, we keep more than s components of the signal non-zero. We often see that 
choosing a larger thresholding value dramatically improves the success rate at the expense of 
increasing the number of iterations required for convergence. 

The algorithm we propose also appears on the surface to be very similar to the Gradient 
Pursuit algorithms discussed in [5]. Indeed, the directional update in gradient pursuit is the same 
as for both IHT and MFR, but the big difference is in how the sparsity constraint is enforced. 
For the gradient pursuit algorithms, a new dictionary element is added at every iteration, and 
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Figure 4: Number of iterations required before convergence using the MFR algorithm and differ- 
ent (fixed) step-lengths. This plot shows the quartiles for 7 e {0.05, 0.1 ... , 0.7}. Observe that 
as 7 increases the number of iterations required for convergence decreases, but the corresponding 
probability of success eventually falls. 



once added, cannot be removed. In contrast, IHT and MFR make use of a pruning step, so at 
every iteration we keep only the most important (decided by the largest magnitude) dictionary 
elements, thus elements can be both added and removed. 

More recent work in [S] uses the same paradigm as we do, i.e., selecting a possibly new support 
at each iteration, where the step- length is chosen to be the negative gradient of ||y — ^x'^-'^H^, 
however their analysis is somewhat different to ours. 

Incorporating a least squares step into the MFR algorithm makes it look similar to the 
CoSaMP algorithm in [27]. There are several significant differences though. 

The CoSaMP algorithm solves the least squares problem over the vectors that have support 
size 3s where s is the sparsity of the true solution. The MFR algorithm typically considers 
a much smaller support size. The column or support selection for the CoSaMP algorithm is 
performed by merging the support of the the previous iterate and the "signal proxy" , that 
is ^^^r^''^ — C'^tE' (x — x'*^') . In the case of the MFR algorithm, the frame reconstruction 
algorithm is used to select the support on which to solve the least squares problem. 

Our work also addresses an area not explicitly discussed previously. We frame our reconstruc- 
tion results in terms of the true sparsity of the signal and an estimate of the sparsity. Indeed we 
show via simulation that it can in fact be of benefit to undcr-cstimate the sparsity of the signal 
we are trying to reconstruct. 
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7 Conclusion 



In this paper we have demonstrated an iterative hard thresholding algorithm inspired by the 
frame reconstruction algorithm. In this work we have generalised the results of [B] and have shown 
how they can be used to guarantee convergence in a wider range of circumstances. Furthermore 
our algorithm significantly outperforms the IHT algorithm at larger s values (where s is the 
sparsity of the signal we are tying to reconstruct). At these points however, we operate beyond 
the range of values for which the theorem is applicable. That is, the 6 and 7 parameters do not 
satisfy either of our conditions because the sparsity level s of x is too high. However we see from 
out simulations that even though the algorithms are not guaranteed to converge to the correct 
solution, they still often do. 

We demonstrate several modifications to our algorithm; incorporating polynomial accelera- 
tion, adding a least squares step and using an adaptive algorithm to select the step-length. These 
modifications increase the performance, that is the rate of successful reconstruction and the rate 
of convergence. 

For the least squares modification we show analytically that the algorithm will converge 
under the same requirements but possibly slightly slower, but empirically we observe that the 
algorithm in fact converges significantly faster. For the adaptive step-length modification we 
show that provided the step-length satisfies the requirements of Theorem [5] the algorithm will 
converge at least as fast. 
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